Structure Cognizant Pseudo Relevance Feedback
Authors
Abstract
We propose a structure cognizant framework for pseudo relevance feedback (PRF). One application is selecting expansion terms for general search from collections such as Wikipedia, whose documents typically share a minimal fixed set of fields, viz., Title, Body, Infobox and Categories. In existing approaches to PRF-based expansion, the weight of an expansion term does not depend on its field(s) of origin. We consider this a weakness of current PRF approaches. We propose a per-field EM formulation for finding the importance of the expansion terms, in line with traditional PRF. However, the final weight of an expansion term is found by weighting these importance scores according to whether the term belongs to the title, the body, the infobox or the category field(s). In our experiments with four languages, viz., English, Spanish, Finnish and Hindi, we find that this structure-aware PRF yields a 2% to 30% improvement in performance (MAP) over vanilla PRF. We conduct ablation tests to evaluate the importance of the various fields. As expected, these tests rank the fields by importance in the order title, body, categories and infobox.
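The field-dependent weighting described in the abstract can be illustrated with a minimal sketch. It assumes the per-field importance scores have already been estimated (the per-field EM step is not shown) and that they are combined by a simple weighted sum. The abstract states neither the combination rule nor any weight values, so the function names, the linear form and the numbers in `FIELD_WEIGHTS` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical field weights: the abstract gives no values. The ordering
# only mirrors the reported ablation ranking (title, body, categories, infobox).
FIELD_WEIGHTS = {"title": 0.4, "body": 0.3, "categories": 0.2, "infobox": 0.1}


def combine_field_scores(per_field_scores, field_weights=FIELD_WEIGHTS):
    """Combine per-field term importance into one weight per term.

    per_field_scores maps a field name to a dict of term -> importance,
    assumed to come from a separate per-field estimation step.
    The weighted sum used here is an assumption, not the paper's formula.
    """
    combined = defaultdict(float)
    for field, scores in per_field_scores.items():
        weight = field_weights.get(field, 0.0)  # unknown fields contribute nothing
        for term, score in scores.items():
            combined[term] += weight * score
    return dict(combined)


def top_expansion_terms(per_field_scores, k, field_weights=FIELD_WEIGHTS):
    """Return the k terms with the highest combined weight (ties broken alphabetically)."""
    combined = combine_field_scores(per_field_scores, field_weights)
    return sorted(combined, key=lambda term: (-combined[term], term))[:k]


if __name__ == "__main__":
    example = {
        "title": {"retrieval": 0.5, "feedback": 0.5},
        "body": {"retrieval": 0.2, "query": 0.8},
    }
    print(top_expansion_terms(example, k=2))
```

In this sketch a term that appears in several fields accumulates weight from each of them, so a title term with moderate importance can outrank a body-only term with higher raw importance.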
Similar resources
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
Recurrent Pseudo Relevance Feedback on Web Collections
Various Relevance Feedback techniques exist in Information Retrieval such as Simulated Relevance Feedback and Pseudo Relevance Feedback. In a Simulated Relevance Feedback technique a new query is reformulated based on the documents selected by the user from the top-ranked documents whereas in a Pseudo Relevance Feedback, the query is reformulated based on the assumption that N top-ranked docume...
Rutgers Information Interaction Lab at TREC 2005: Trying HARD
Within the structure of the TREC 2005 HARD track guidelines, we investigated the following hypotheses: H1: Query expansion using a “clarity”-based approach will increase effectiveness over baseline queries and baseline queries plus pseudo-relevance feedback; H2: Query expansion based on the Web will increase effectiveness over baseline queries and baseline queries plus pseudo-relevance feedback...
Pseudo Relevance Feedback Method Based On Taylor Expansion Of Retrieval Function In NTCIR-3 Patent Retrieval Task
Pseudo relevance feedback is empirically known as a useful method for enhancing retrieval performance. For example, we can apply the Rocchio method, which is well-known relevance feedback method, to the results of an initial search by assuming that the top-ranked documents are relevant. In this paper, for searching the NTCIR-3 patent test collection through pseudo feedback, we employ two releva...